A novel coronavirus is the causative agent of the current epidemic of severe acute respiratory 
syndrome (SARS). Coronaviruses are exceptionally large RNA viruses and employ complex 
regulatory mechanisms to express their genomes. Here, we determined the sequence of SARS 
coronavirus (SARS-CoV), isolate Frankfurt 1, and characterized key RNA elements and protein 
functions involved in viral genome expression. Important regulatory mechanisms, such as the 
(discontinuous) synthesis of eight subgenomic mRNAs, ribosomal frameshifting and post- 
translational proteolytic processing, were addressed. Activities of three SARS coronavirus 
enzymes, the helicase and two cysteine proteinases, which are known to be critically involved in 
replication, transcription and/or post-translational polyprotein processing, were characterized. 
The availability of recombinant forms of key replicative enzymes of SARS coronavirus should pave 
the way for high-throughput screening approaches to identify candidate inhibitors in compound 
INTRODUCTION 


Severe acute respiratory syndrome (SARS) is a life- 
threatening form of pneumonia (Peiris et al., 2003a). In 
the course of a few months in 2003, an epidemic emerged 
that has spread from its likely origin in Guangdong Province, 
China, to 32 countries. By 11 June 2003 more than 8400 
cases and 789 deaths had been recorded by the World 
Health Organization. The rapid transmission by aerosols 
(and probably also the faecal-oral route) and the high 
mortality rate make SARS a global threat for which no 
efficacious therapy is available. There is now clear evidence 
that SARS is caused by a previously unknown coronavirus, 
provisionally termed SARS coronavirus (SARS-CoV) (Peiris 
et al., 2003b; Drosten et al., 2003; Ksiazek et al., 2003; 
Fouchier et al., 2003). Genome sequences of SARS-CoV 
isolates obtained from a number of index patients have 
been published recently and provide important information 
on the organization, phylogeny and variability of the 
29.7 kb positive-strand RNA genome of SARS-CoV (Rota 
et al., 2003; Marra et al., 2003; Ruan et al., 2003). By analogy 
with other coronaviruses (Lai & Holmes, 2001; Siddell, 
1995; Gorbalenya, 2001), SARS-CoV gene expression is 
expected to involve complex transcriptional, translational 
and post-translational regulatory mechanisms, whose mole- 
cular details remain to be determined. SARS-CoV genome 
expression starts with the translation of two large replicative 
polyproteins, pp1a (486 kDa) and pp1ab (790 kDa), which 
are encoded by the viral replicase gene (21221 nt) that 
comprises ORFs 1a and 1b (Fig. 1). Expression of the 
ORF 1b-encoded region of pp1ab is predicted to involve 
ribosomal frameshifting into the -1 frame just upstream of 
the ORF1a translation termination codon (Brierley et al., 
1989). The pp1a and pp1ab polyproteins are processed by 
viral proteinases to yield the functional components of the 
membrane-bound replicase complex (Ziebuhr et al., 2000). 
In contrast to most other coronaviruses, which use three 
proteinase activities for replicase polyprotein processing 
(Ziebuhr et al., 2000; Gorbalenya, 2001), SARS-CoV is 
predicted to encode only two proteinases (Rota et al., 2003; 
Snijder et al., 2003). The replicase complex mediates both 
genome replication and transcription of a ‘nested’ set of 
subgenomic mRNAs. These mRNAs encode the structural 
Fig. 1. SARS-CoV genome organization and expression. (a) The SARS-CoV ORFs, frameshift (FS) and TRS elements, and 
genomic and subgenomic mRNAs, are shown. Black boxes represent the 72 nt leader RNA sequence located at the 5’ end of 
each viral mRNA. Also indicated are the viral proteins predicted to be expressed from a given mRNA’s ‘unique’ region (i.e. the 
region not present on smaller mRNAs). (b) Northern blot analysis of poly(A)-containing RNA from SARS-CoV infected Vero 
(lane 1) and Vero E6 (lane 2) cells. A 32P-labelled probe corresponding to the 3'-terminal 794 nt of the SARS-CoV genome 
was used to detect genomic and subgenomic SARS-CoV mRNAs. Poly(A)-containing RNA from HCoV-infected MRC-5 cells, 
which was hybridized with a 32P-labelled probe specific for HCoV nt 26297-27273, was used as size marker (lane 3). 
(c) Alignment of SARS-CoV TRS elements identified by RT-PCR amplification and sequencing. Nucleotides matching the 
leader TRS are underlined. These sequences represent the leader-to-body fusion sites of subgenomic RNAs. The minimal 
TRS (5’-ACGAAC-3’) present in all functional TRSs is highlighted in grey. (d) Strategy used to identify the leader-to-body 
fusion sites of SARS-CoV subgenomic mRNAs. As an example, the RT-PCR amplification and sequence analysis of the fusion 
site of mRNA 3 is shown. The reverse transcription reaction was primed using an oligonucleotide specific for the body 
sequence. For the subsequent PCR, a leader-specific oligonucleotide was used in combination with a second, body-specific 
oligonucleotide. Sequence analysis of the resulting PCR product is shown in the bottom panel, with the TRS core sequence 
and translation initiation codon indicated. 
proteins, S, E, M and N, and a set of accessory proteins 
whose number and sequence vary among different corona- 
virus species (Siddell, 1995). The extraordinary size of the 
coronavirus replicase (poly)proteins, their generally large 
phylogenetic distance from those of other RNA viruses, and 
the presence of several predicted RNA processing activities 
which are not found in other positive-strand RNA viruses 
(Gorbalenya, 2001; Snijder et al., 2003), indicate that 
coronavirus replicases are of an unparalleled complexity. 
The underlying biological mechanisms and functional con- 
straints that determine the evolution and conservation of 
these unique activities remain to be elucidated. 


In this study, we used the Frankfurt 1 isolate, the sequence of 
which we report here, to characterize critical steps of SARS- 
CoV gene expression, such as synthesis of subgenomic 
mRNAs, translation of key replicative proteins by frame- 
shifting and post-translational processing. Proteolytic pro- 
cessing by viral proteinases provides the active components 
of the viral replicase complex. The results we obtained were 
used to rank possible targets for therapeutic intervention in 
SARS and other coronavirus infections. 


METHODS 


Preparation of SARS-CoV RNA. Vero or Vero E6 cells (1 × 10^7) 
were infected with SARS-CoV (Drosten et al., 2003) (isolate 
Frankfurt 1, fifth passage in cell culture) at an m.o.i. of 0.01. Two 
days after infection, intracellular poly(A)-containing RNA was pre- 
pared as described by Thiel et al. (2001). RNA isolated from respira- 
tory tract and stool specimens was prepared using the QIAamp and 
QIAamp stool kit (Qiagen), respectively, according to the manufac- 
turer’s instructions. 


Sequencing of the SARS-CoV (Frankfurt 1) RNA genome. To 
determine the SARS-CoV genomic sequence, a set of overlapping 
RT-PCR products with an average size of 2 kb encompassing the 
entire genome was generated as described by Thiel et al. (1997). To 
generate RT-PCR products containing the exact 3’-terminal sequence 
of SARS-CoV genomic RNA, reverse transcription was primed using 
oligonucleotide OLV1/57 (5'-GCCGGCGCCAGCGAGGAGGCTGG- 
GACCATGCCGGCCTTTTTTTTTTTTTTTTTT-3') and PCR was done 
using oligonucleotides PCR-L (5'-23GGAAAAGCCAACCAACCTC- 
GATCTC47-3') and OLV1/58 (5'-ACGTTCTAGAGCCCAGCCGGCG- 
CCAGCGAGGAGGCT-3'). To generate RT-PCR products containing 
the exact 5'-terminal sequence the FirstChoice RLM-RACE Kit 
(Ambion) was used according to the manufacturer's instructions 
with the following modifications. A synthetic RNA corresponding to 
human coronavirus 229E (HCoV-229E) nt 1-600 was used as RNA 
adapter and reverse transcription was primed using oligonucleotide 
S65 (5'-439CTTTTTCCAGCTCTACTAGACCAC416-3'). Outer PCR was 
done using oligonucleotides 240up (5'-CCTTACTCGAGGTTCCG- 
TCTCGTG-3') and S132 (5'-341ACGTCTCTAACCTGAAGGACA- 
GGC318-3'); inner PCR was done using oligonucleotides Oli5 
(5'-GCGAGGCCGCTAGCAATGG-3') and S133 (5'-3;CTAGGTA- 
TGCTGATGATCGACTGC)),-3'). All RT-PCR products served as 
template for sequencing analysis using a total number of 149 
sequencing primers and the BigDye Terminator v3.1 Cycle 
Sequencing Kit. Sequencing products were detected using an ABI 
PRISM 3100 Genetic Analyser (Applied Biosystems) and computer- 
assisted analysis of sequencing data was facilitated by the Lasergene 
bio-computing software (DNASTAR). 


Analysis of SARS-CoV mRNAs. Poly(A)-containing RNA from 
SARS-CoV- and HCoV-infected cells was separated on a 2.2 M 
formaldehyde/1% agarose gel, blotted on nylon membrane and 
hybridized with 32P-multiprime-labelled DNA probes corresponding 
to the 3’-terminal 794 nt of the SARS-CoV genome and HCoV nt 
26297-27273, respectively. RNAs were analysed by autoradiography. 
To determine the leader-to-body fusion sites of SARS-CoV sub- 
genomic mRNAs, reverse transcription of poly(A)-containing RNA 
from SARS-CoV-infected cells was primed using oligonucleotides 
RT-S._ (5!-39947ATAGGCTGCAGCTGACGTGCCCCA39994-3'), RT-3 
(5! ~ysg9sGTTTTGGIGTTGAAATGCCGTCACC) 5781-3"), RT-E (5's¢394T- 
TAACACGCGAGTAGACGTAAACCG¢080-3")s, RT-M. (5'-2696gAT- 
CAGTGCCTACACGCTGCGACGCo¢941-3’)s RT 6-8 (5’->g043ACAC- 
CTAGCTATAAGCGCACCACCoe999-3’) and RT-N (5!-yg79gTGTC- 
TAGCAGCAATAGCGCGAGGG(C}774-3'). PCR amplification was 
done using the SARS-CoV leader-specific oligonucleotide PCR-L in 
combination with body-specific oligonucleotides PCR-S (5'-2219gAC- 
TACATCTATAGGTTGATAGCCCT 5 984-3")s PCR-3 (5'~5734T AGT- 
CATAGTTATGTGTGTGCCAGC)5719-3’)) PCR-E. (5'~26243AGTAC- 
GCACACAATCGAAGCGCAG96999-3’)s PCR-M (5'-367g3 CACAATT- 
GTCCCCCGGAGAGGCAC 6758-3"), PCR-6-8 (5'-27937 CCTAGAGC- 
ACAAAGCCAAGCAGTGC)7913-3") and PCR-N (5'-2g60gAGGAAG- 
TTGTAGCACGGTGGCAGC ¢5g5-3’). Sequence analysis of PCR 
products was done using primers SEQ-S (5'-31733AAGGTATGACA- 
GGGTTGCCAAACG) 1715-3’), SEQ-3 (5'-255;sAAATTGCAAATGA- 
ACTGGAAGCCCo5492-3"), SEQ-E (5'-26243AGTACGCACACAATC- 
GAAGCGCAG96939-3")s SEQ-M (5'-26640AATCGCAATCCCGCCAG- 
TCACCC 6618-3"), SEQ-6 (5'-27244AGGTTCTTCATCATCTAACTCC- 
GAgo7901-3'), SEQ-7 (5'~27439AGTGCAAATTTATTGTCAGCAAGA 7416" 
3'), SEQ-8 (5'-27937 CCTAGAGCACAAAGCCAAGCAGTGC57913-3") 
and SEQ-N (5’-g39gTCGGGTAGCTCTTCGGTAGTAGCCy 9375-3’). 


In vitro transcription and translation. In vitro transcription reac- 
tions were done using the RiboMAX Large Scale RNA Production 
System-T7 (Promega) and m7G(5')ppp(5')G cap structure analogue 
as described by Thiel et al. (2001). In vitro translation reactions were 
done in rabbit reticulocyte lysate (Promega) using in vitro- 
transcribed RNA (Ziebuhr & Siddell, 1999). Alternatively, DNA 
templates containing a T7 promoter were transcribed and translated 
using the TNT T7-Coupled Reticulocyte Lysate System (Promega). 
To analyse the SARS-CoV frameshifting element, a DNA fragment 
corresponding to SARS-CoV nt 12955-13961 was amplified by RT- 
PCR using oligonucleotides S25 (5'-175;;CAATTTCAGCAGGACAA- 
CGGCGAC) 748-3'; RT reaction), JZ464 (5'-AATAATACGACTCA- 
CTATAGGGAACCATGGCTGGAAATGCTACAGAAGTACCTGC-3'; 
PCR sense primer) and JZ465 (5'-AAAGAATTCTTAACGCATAGC- 
ATCGCAGAATTGTAC-3'; PCR antisense primer). To generate a DNA 
fragment with mutated 'slippery' sequence (13392UGUAGCC13398), two 
PCRs were done using oligonucleotides S59 (5'-12916ATGGTGCTG- 
GGCAGTTTAGCTGCT12939-3'; PCR 1, sense primer), JZ467 (5'- 
AAAAGGTCTCCGGCTAGAAACGTTGATGCATCCGCAGAC-3'; PCR 1, 
antisense primer), JZ466 (5'-AAAAGGTCTCTAGCCGGGTTTGCG- 
GTGTAAGTGCAGCCC-3'; PCR 2, sense primer) and S91 (5'-14939- 
TACGAAATCACCGAAATCGTACCA14916-3'; PCR 2, antisense primer). 
PCR products 1 and 2 were cleaved with Bsal restriction endonu- 
clease and ligated using T4 DNA ligase. The ligation product was 
used as template to amplify a DNA fragment corresponding to 
SARS-CoV nt 12955-13961 (with mutated ‘slippery’ sequence) using 
oligonucleotides JZ464 and JZ465. 
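
The joining of PCR products 1 and 2 relies on BsaI cutting outside its recognition site, so that the two fragments can be designed to carry matching overhangs. As a rough sanity check (a sketch, not the authors' procedure), the Python snippet below takes the two internal primers as printed above and assumes the standard BsaI geometry (GGTCTC, cut 1 nt downstream on the primer strand, leaving a 4 nt 5' overhang). It then tests whether the overhangs derived from JZ467 and JZ466 are complementary. The function names are illustrative only.

```python
# Sketch: check that the BsaI sites in primers JZ467 and JZ466 yield
# complementary 4-nt overhangs. Assumes the standard BsaI geometry
# (GGTCTC N1 / N5), i.e. a 4-nt 5' overhang starting 1 nt after the site.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bsai_overhang(primer: str) -> str:
    """Return the 4-nt overhang left by BsaI cutting within the primer."""
    site = "GGTCTC"
    start = primer.index(site) + len(site) + 1  # skip the site and N1
    return primer[start:start + 4]


# Primer sequences copied from the text above.
JZ467 = "AAAAGGTCTCCGGCTAGAAACGTTGATGCATCCGCAGAC"  # PCR 1, antisense
JZ466 = "AAAAGGTCTCTAGCCGGGTTTGCGGTGTAAGTGCAGCCC"  # PCR 2, sense

overhang_1 = bsai_overhang(JZ467)
overhang_2 = bsai_overhang(JZ466)
compatible = reverse_complement(overhang_1) == overhang_2
print(overhang_1, overhang_2, compatible)
```

If the two overhangs are reverse complements of each other, the digested fragments can anneal and be ligated without leaving the BsaI sites in the product.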


For in vitro transcription-translation of the 3CLpro substrate repre- 
senting nsp7-nsp10 of SARS-CoV (Ser-3837-Gln-4369), the corres- 
ponding coding sequence of SARS-CoV was amplified using primers 
JZ433 (5'-AATAATACGACTCACTATAGGGCGAACCATGTCTAAAA- 
TGTCTGACGTAAAGTGCA-3') and JZ434 (5'-AAAGAATTCTTACT- 
GCATCAAGGGTTCGCGGAGTTG-3'). The upstream primer contained 
a T7 RNA polymerase promoter. For in vitro transcription-translation 
of the pp1a/pp1ab sequence Lys-737-Ser-1858, containing the PL2pro 
domain and the presumed nsp2|nsp3 cleavage site, the corresponding 
coding sequence was amplified by RT-PCR using primers AP91 (5'- 
TAATACGACTCACTATAGGGACGGGAACACCATGGCAAAAGA- 
AGTAACCTTTCTTGAAGGT-3') and S77 (5'-5839ACGACACAGGC- 
TTGATGGTTGTAG5816-3'). To introduce a PL2pro active-site muta- 
tion (Cys-1651 to Ala) in this sequence, two PCRs were done using (1) 
primers AP91 and AP94 (5'-ATAGCTCTTCATGCATTGTTATCAG- 
CCCATTTAATTGA-3') and (2) AP95 (5'-ATAGCTCTTCAGCATA- 
TTTGTCTAGTGTTTTATTAGCA-3') and S77. Following digestion of 
the PCR products obtained with SapI and ligation with T4 DNA ligase, 
the pp1a/pp1ab coding sequence Lys-737-Ser-1858 was re-amplified 
using the ligation product as a template and primers AP91 and S77. The 
sequences of PCR products used as templates for in vitro transcription 
were confirmed by nucleotide sequencing. 


Protein expression, purification, and activities. Plasmid con- 
struction, expression in Escherichia coli and purification of the 
maltose-binding protein (MBP) fusion proteins MBP-HCoV 3CLpro 
and MBP-TGEV (porcine transmissible gastroenteritis virus) 3CLpro, 
and the corresponding active-site mutants, MBP-HCoV 3CLpro_ 
C3109V and MBP-TGEV 3CLpro_C3022A, have been described 
previously (Ziebuhr et al., 1995, 1997; Hegyi et al., 2002). The same 
approach was taken to express the SARS-CoV pp1a/pp1ab amino 
acids 3241-3545 (i.e. the SARS-CoV 3CLpro domain lacking the two 
C-terminal residues, Phe-3545 and Gln-3546) and the corresponding 
active-site Cys-3385-to-Ala mutant. The coronavirus proteinases 
were released from MBP by factor Xa cleavage and used, according 
to previously published protocols (Ziebuhr & Siddell, 1999), in 
trans-cleavage assays with in vitro-translated substrate or 0.5 mM of 
synthetic 15-mer peptides whose sequences were derived from the 
N-terminal TGEV and mouse hepatitis virus (MHV) 3CLpro auto- 
processing sites (Seybert et al., 1997; Hegyi & Ziebuhr, 2002). The 
SARS-CoV helicase (SARS-CoV HEL, pp1ab residues Ala-5302-Gln- 
5902) and a control protein, SARS-CoV HEL_KA, in which the con- 
served Lys of the Walker A box (SARS-CoV pp1ab K5589) was replaced 
by Ala, were expressed and purified in a similar way. Briefly, the 
helicase-coding region was amplified by RT-PCR using primers 
JZ425 (5'-GCTGTAGGTGCTTGTGTATTGTGC-3’) and JZ426 (5’- 
AAAACTGCAGTTATTGTAATGTAGCCACATTGCGACGTGG-3’). 
The PCR product was digested with PstI and inserted in XmnI- and 
PstI-digested pMal-c2 DNA (New England Biolabs). The mutation 
was introduced using a PCR-in vivo recombination method (Yao 
et al., 1992). Expression and purification of MBP-HEL and MBP- 
HEL_KA were done essentially as described for the HCoV-229E 
helicase (Heusipp et al., 1997). The partially double-stranded DNA 
substrate used in the unwinding assay was produced by annealing 
oligonucleotides D2 [5'-GGTGCAGCCGCAGCGGTGCTCG-d(pT)30-3’] 
and [α-32P]ATP-labelled D3 [5'-d(pT)39-CGAGCACCGCTGCGGC- 
TGCACC-3'] as described by Seybert et al. (2000a). The unwinding 
reaction was done for 30 min at 25°C in buffer A (HEPES/KOH, 
pH 7.4, 10% glycerol, 5 mM magnesium acetate, 2 mM dithiothrei- 
tol and 0.1 mg BSA ml^-1) using 10 nM of substrate and various 
concentrations of MBP-HEL (8, 80 and 800 nM) and MBP-HEL_ 
KA (800 nM), respectively. The reaction products were analysed on 
polyacrylamide/TBE gels, which were exposed to X-ray film. ATPase 
reactions were done in buffer A for 5 min at 25°C using the follow- 
ing concentrations: MBP-HEL and MBP-HEL_KA each at 0.8 nM, 
10 µM [α-32P]ATP, 1 µM poly(U)250 (when included). The samples 
were analysed by polyethyleneimine-cellulose thin-layer chromato- 
graphy with 0.25 M potassium phosphate, pH 4.0, as the liquid 
phase. The reaction products were quantified by phosphorimaging 
of the dried chromatographic plates (ImageQuant software, Molecular 
Dynamics). 
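
The design of the partially double-stranded unwinding substrate can be checked from the D2 and D3 sequences given above. The short Python sketch below is offered only as an illustration: it leaves out the oligo(dT) tails and tests whether the remaining 22 nt parts of D2 and D3 are reverse complements, which is what annealing into a duplex requires. The variable names are illustrative.

```python
# Sketch: verify that the non-oligo(dT) parts of oligonucleotides D2 and D3
# are reverse complements, so that annealing gives a short duplex with two
# single-stranded oligo(dT) tails at the same end (a forked substrate).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Core sequences as printed in the text; the d(pT) tails are omitted.
D2_CORE = "GGTGCAGCCGCAGCGGTGCTCG"  # followed by a 3' oligo(dT) tail
D3_CORE = "CGAGCACCGCTGCGGCTGCACC"  # preceded by a 5' oligo(dT) tail

duplex_length = len(D2_CORE)
duplex_forms = reverse_complement(D2_CORE) == D3_CORE
print(duplex_length, duplex_forms)
```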


RESULTS AND DISCUSSION 


Genomic sequence of SARS-CoV Frankfurt 1 


Here we report the complete genome sequence of a SARS- 
CoV isolate, Frankfurt 1, obtained from a 32-year-old male 
physician who was admitted with typical symptoms of SARS 
to the isolation ward of the Frankfurt University hospital 
on 15 March 2003 (Drosten et al., 2003). The virus was 
propagated in African green monkey kidney (Vero and Vero 
E6) cells, poly(A)-containing RNA was isolated and used for 
reverse transcription and PCR amplification with primers 
derived from the SARS-CoV TOR2 sequence. The exact 
genome termini were determined by 5’ and 3’ RACE 
methods. The complete sequence encompasses 29727 nt 
[excluding the 3’ poly(A) tail] and has been deposited with 
the GenBank database (accession no. AY291315). Comparison 
of the genomic sequence of SARS-CoV Frankfurt 1 with 17 
previously sequenced SARS-CoV isolates (TOR2, AY274119.3; 
Urbani, AY278741.1; CUHK-W1, AY278554.2; BJ01, 
AY278488.2; HKU-39849, AY278491.2; BJ02, AY278487.3; 
BJ03, AY278490.3; BJ04, AY279354.2; GZ01, AY278489.2; 
SIN2679, AY283796.1; SIN2500, AY283794.1; ZJ01, AY297028.1; 
SIN2677, AY283795.1; TW1, AY291451.1; CUHK-Su10, 
AY282752.1; SIN2748, AY283797.1; SIN2774, AY283798.1) 
revealed four nucleotide exchanges that have not been 
described previously for other SARS-CoV isolates and thus 
seem to be specifically linked to the Frankfurt 1 isolate 
(A2557, U11448, U24933, U28268). Three of these (may) 
lead to amino acid substitutions (assuming all predicted 
ORFs are expressed). Interestingly, the sequence analysis 
also revealed that, upon SARS-CoV propagation in cell 
culture (starting with passage 3), a virus variant emerged in 
which a 45 nt, in-frame sequence (27670-27714) was 
deleted from ORF7b, thus reducing the genome size to 
29682 nt. The deletion provides evidence that SARS-CoV 
may undergo rapid adaptation in cell culture. The biological 
significance of this observation remains to be investigated 
in detail. 
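
The figures quoted for the cell-culture variant are easy to verify. The following Python lines are a minimal sketch of that arithmetic, using only the coordinates and genome length stated above; the variable names are illustrative.

```python
# Sketch: arithmetic behind the 45-nt ORF7b deletion described above.
GENOME_LENGTH = 29727               # nt, excluding the 3' poly(A) tail
DEL_START, DEL_END = 27670, 27714   # first and last deleted nt (inclusive)

deleted_nt = DEL_END - DEL_START + 1        # inclusive coordinate range
in_frame = deleted_nt % 3 == 0              # a multiple of 3 keeps the frame
deleted_codons = deleted_nt // 3            # amino acids removed
variant_length = GENOME_LENGTH - deleted_nt
print(deleted_nt, in_frame, deleted_codons, variant_length)
```

The deleted range is a whole number of codons, which is why the reading frame downstream of the deletion is preserved.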


Subgenomic mRNA synthesis 


Coronaviruses (and arteriviruses) use a unique strategy to 
synthesize a set of subgenomic RNAs with common 5’ and 
3’ sequences (Fig. 1) (Lai & Holmes, 2001; Siddell, 1995; 
Pasternak et al., 2001; Sawicki & Sawicki, 1998). Each mRNA 
contains a short 5’-terminal ‘leader’ sequence derived from 
the 5’ end of the genome. The fusion of the noncontiguous 
sequences is currently believed to be achieved by a dis- 
continuous step during minus-strand synthesis and involves 
transcription regulatory sequences (TRSs). In addition to 
the TRS at the 3’ end of the leader sequence (leader TRS), 
TRSs are located upstream of the genes in the 3'-proximal 
part of the genome (body TRSs) (Lai & Holmes, 2001; 
Siddell, 1995; Sawicki & Sawicki, 1998). To confirm that 
SARS-CoV also uses this discontinuous transcription strategy 
and to elucidate the molecular details of the resulting 
subgenomic RNAs, we analysed intracellular RNA synthesis 
by Northern blotting and determined the SARS-CoV mRNA 
sequences at the sites where the common 5’ leader is fused 
to the various 3’ ‘body’ sequences, thus generating the 
subgenomic RNAs identified in SARS-CoV-infected cells. 
A Northern blot analysis using poly(A)-containing RNA 
isolated from SARS-CoV-infected cells and a probe specific 
for the 3’-proximal 794 nt revealed the synthesis of as many 
as nine RNAs, with RNA 1 representing the viral genome of 
29.7 kb. The sizes of the subgenomic mRNAs were assessed 
using the previously characterized HCoV-229E RNAs (Thiel 
et al., 2001) as markers. To provide conclusive evidence for 
the presence of common 5’ leader sequences in each of the 
SARS-CoV mRNAs and to determine the leader-to-body 
fusion sites precisely, the 5’-proximal regions of mRNAs 2 to 
9 were amplified by RT-PCR and sequenced. The amplifi- 
cation strategy used in these experiments is illustrated for 
subgenomic mRNA 3 in Fig. 1(d). In some cases we obtained, 
in addition to the expected RT-PCR product for a given 
mRNA, larger PCR products that corresponded to the 
expected RT-PCR products for the next largest subgenomic 
mRNAs. Sequence analysis of these products confirmed 
their identity unambiguously. The data obtained by RT- 
PCR amplification, sequence analysis and Northern blotting 
consistently suggest that SARS-CoV produces eight sub- 
genomic mRNAs. Furthermore, the study revealed that a 
minimal consensus sequence, 5’-ACGAAC-3’, is sufficient 
to direct the synthesis of SARS-CoV subgenomic mRNAs, 
most probably by base-pairing of its negative-stranded 
counterpart to the leader TRS during minus-strand syn- 
thesis. The number of identical nucleotides in leader TRS 
and body TRS regions varies between 6 and 11 (Fig. 1c), but 
there is no clear correlation between the extent of sequence 
complementarity and abundance of a given mRNA (Fig. 1b 
and 1c), indicating that additional factors (such as sequence 
elements, RNA structures, proteins) are involved in regu- 
lating the relative abundance of viral mRNAs. It is tempting 
to speculate that the transcription mechanism used by 
coronaviruses, arteriviruses and (in part) toroviruses 
(van Vliet et al., 2002) has evolved to allow the production 
of a large set of structural and nonstructural (some of them 
probably virulence-associated) proteins (de Haan et al., 
2002), whose abundance can be regulated at the transcrip- 
tional level. Regulation of coronavirus gene expression can 
be even further extended by the presence of additional, 
downstream ORFs in the 5’ unique regions of some of the 
subgenomic mRNAs. These are generally expressed by leaky 
scanning of ribosomes or internal ribosomal entry (Lai & 
Holmes, 2001; Siddell, 1995; Thiel et al., 1994). As shown in 
Fig. 1, SARS-CoV also produces four subgenomic RNAs 
(mRNA 3, 7, 8, 9) with downstream ORFs in their unique 
regions. The functions of the corresponding SARS-CoV 
gene products remain to be characterized. The observed 
45 nt deletion in the putative ORF7b (see above) appears to 
suggest that at least one of these gene products is dispensable 
in cell culture. However, it cannot be excluded that the 15 aa 
deletion from the ORF7b gene product gives rise to an active 
(or partially active) protein. 
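
To make the TRS analysis concrete, the sketch below shows one simple way to locate the minimal core 5'-ACGAAC-3' in a sequence and to count identical nucleotides between an aligned leader and body TRS region. It is an illustration only: the sequences are invented toy strings, not SARS-CoV data, and the function names are hypothetical.

```python
# Sketch: locate the minimal TRS core in a sequence and count identical
# positions between two aligned TRS regions. The sequences below are
# invented toy strings, not SARS-CoV data.

TRS_CORE = "ACGAAC"


def find_core_sites(seq: str, core: str = TRS_CORE) -> list[int]:
    """Return 0-based start positions of every core match (overlaps allowed)."""
    sites = []
    pos = seq.find(core)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(core, pos + 1)
    return sites


def identical_positions(leader: str, body: str) -> int:
    """Count matching nucleotides between two pre-aligned, equal-length regions."""
    if len(leader) != len(body):
        raise ValueError("regions must be aligned to the same length")
    return sum(a == b for a, b in zip(leader, body))


toy_genome = "GGTTACGAACTTAGCCGGATACGAACAAATG"  # hypothetical sequence
toy_leader = "TCTAAACGAACTT"                    # hypothetical leader TRS region
toy_body = "GATAAACGAACAA"                      # hypothetical body TRS region

print(find_core_sites(toy_genome))
print(identical_positions(toy_leader, toy_body))
```

Counting identities over a fixed window is a deliberately crude measure; it says nothing about RNA structure or protein factors, which the text suggests also influence mRNA abundance.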


Translation 


The structures of SARS-CoV mRNAs (Fig. 1) lead us to 
suggest that five of the nine SARS-CoV RNAs are func- 
tionally bicistronic. For most of them, the mechanisms used 
to express the downstream ORFs remain to be determined. 
On the basis of the available data for other coronaviruses 
(Brierley et al., 1989; Eleouet et al., 1995; Herold et al., 1993; 
Kocherhans et al., 2001), it seemed likely that ORF1b 
expression from the genomic RNA would involve -1 
ribosomal frameshifting, a process that essentially depends 
on two elements: the 'slippery' sequence, i.e. the 
site where the ribosomes shift into the -1 reading frame, 
and a complex RNA pseudoknot structure (Brierley et al., 
1989, 1995). Analysis of the SARS-CoV sequence flanking 
the ORF1a termination codon revealed a putative SARS- 
CoV frameshifting element comprising a putative 'slippery' 
sequence (13392UUUAAAC13398) and, further downstream, 
stretches of complementary sequences that can be modelled 
to form a typical pseudoknot structure (Fig. 2a) (Brierley 
et al., 1995). To confirm that these elements mediate frame- 
shifting in SARS-CoV and to define the frameshift site 
precisely, we synthesized RNAs containing the SARS-CoV 
frameshift region and produced a mutant version of 
the presumed 'slippery' sequence (13392UUUAAAC13398 → 
13392UGUAGCC13398). As shown in Fig. 2(b), efficient 
ribosomal frameshifting depended on an essentially unmo- 
dified 13392UUUAAAC13398 sequence, supporting the pre- 
diction that this sequence constitutes the actual slippage site. 
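
The difference between the wild-type and mutated heptanucleotides can be illustrated with a small pattern check. The Python sketch below tests both sequences against the commonly cited 'slippery' motif X XXY YYZ (three identical bases, then AAA or UUU, then any base). That motif definition is an assumption taken from the general frameshifting literature rather than from this study, and the function name is illustrative.

```python
# Sketch: test heptanucleotides against the commonly cited "slippery"
# motif X XXY YYZ (three identical bases, then AAA or UUU, then any base).
# The motif definition is an assumption from the general frameshifting
# literature, not something established by this study.
import re

SLIPPERY_MOTIF = re.compile(r"([ACGU])\1\1(AAA|UUU)[ACGU]")


def is_slippery(heptamer: str) -> bool:
    """Return True if the 7-nt RNA sequence fits the X XXY YYZ pattern."""
    return SLIPPERY_MOTIF.fullmatch(heptamer.upper()) is not None


wild_type = "UUUAAAC"  # nt 13392-13398
mutant = "UGUAGCC"     # mutated sequence tested in Fig. 2(b), lane 2

span = 13398 - 13392 + 1  # number of nucleotides covered by the motif
print(is_slippery(wild_type), is_slippery(mutant), span)
```

A pattern match of this kind only flags candidate slippage sites; whether frameshifting actually occurs still has to be shown experimentally, as done here by in vitro translation.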


Polyprotein processing by PL2pro 


Translation of pp1a and its C-terminally extended version, 
pp1ab (see above), is coupled with extensive proteolytic 
processing by viral proteinases (Fig. 3). Except for infectious 
bronchitis virus (IBV) (Liu et al., 1995; Lim et al., 2000), all 
previously characterized coronaviruses encode two (para- 
logous) papain-like cysteine proteinases (PL1pro and PL2pro), 
which cleave the N-proximal polyprotein regions at three 
sites (Gorbalenya et al., 1991; Bonilla et al., 1997; Baker et al., 
1989; Kanjanahaluethai & Baker, 2000; Ziebuhr et al., 2001; 
Herold et al., 1998), while the 3C-like cysteine proteinase 
(3CLpro; also called main proteinase, Mpro) cleaves the central 
and C-proximal regions at 11 conserved sites (Ziebuhr et al., 
2000). The conservation of both the positions and sequ- 
ences of coronavirus pp1a/pp1ab cleavage sites allows the 
substrate specificities of proteinases of newly identified 
coronaviruses to be predicted (Fig. 3). Thus, it has been 
proposed that SARS-CoV PL2pro cleaves three sites in the 
N-proximal region of pp1a/pp1ab (Snijder et al., 2003). This 
is similar to IBV, where one proteinase, PL2pro, cleaves two 
sites in this region, but stands in contrast to other coro- 
naviruses (e.g. HCoV-229E and MHV), which cleave three 
sites by using two PLpro activities, PL1pro and PL2pro. To 
establish the proteolytic activity of the SARS-CoV PL2pro 
domain at one of the predicted sites, we expressed, by 
in vitro translation, the SARS-CoV pp1a/pp1ab amino acids 
737-1858 along with a mutant construct in which the 


http://vir.sgmjournals.org 


V. Thiel and others 


[Fig. 2 graphic. Panel (a): pseudoknot model (stem 2, loop 2) downstream of the 'slippery' sequence 5'-UUUAAACGGG, nt 13392-13398, with nt 13435 marked. Panel (b): gel with lanes M, 1 and 2; shifted product (38 kDa) and non-shifted product (16 kDa) indicated.]

Fig. 2. SARS-CoV RNA-mediated ribosomal frameshifting. (a) Model of the SARS-CoV ribosomal frameshifting element 
which, by analogy with other coronaviruses, is proposed to consist of a putative pseudoknot structure comprising two stems 
and two loops and a 'slippery' sequence (13392UUUAAAC13398, underlined). Also shown are nucleotide changes that were 
introduced into the 'slippery' sequence to test its role in SARS-CoV RNA-mediated ribosomal frameshifting. (b) Functional 
analysis of ribosomal frameshifting. Synthetic RNAs corresponding to SARS-CoV nt 12955-13961 with the authentic 
(13392UUUAAAC13398, lane 1) or mutagenized (13392UGUAGCC13398, lane 2) putative SARS-CoV 'slippery' sequence were 
translated in vitro using rabbit reticulocyte lysate. The sizes of translation products calculated for non-shifted ORF1a-encoded 
and shifted ORF1a/1b-encoded translation products are given. 
Fig. 3. SARS-CoV enzymatic activities characterized in this study. The two SARS-CoV replicase polyproteins, pp1a and pp1ab, 
are shown together with the papain-like proteinase 2 (PL2pro), 3C-like proteinase (3CLpro) and helicase (HEL) domains charac- 
terized in this study. The positions of cleavage sites predicted to be processed by PL2pro (blue) and 3CLpro (red) are indicated. 
Fig. 4. Proteolytic activity of the SARS-CoV papain-like cysteine proteinase 2. (a) Representation of the SARS-CoV pp1a/ 
pp1ab cleavage sites predicted to be cleaved by PL2pro (shown in blue) or 3CLpro (red). Also shown is the structure of the 
in vitro-translated protein that contains the putative PL2pro domain and the predicted nsp2|nsp3 site. (b) In vitro translation of 
the SARS-CoV pp1a/pp1ab 737-1858 residues. Lane 1, 737-1858 reaction after 40 min; lane 2, 737-1858 reaction after 
160 min; lane 3, 737-1858_CA (Cys-1651→Ala) reaction after 40 min; lane 4, 737-1858_CA reaction after 160 min. 
Full-length proteins (substrates) and the larger cleavage product are indicated by arrows. 


presumed active-site Cys nucleophile of PL2pro was replaced 
by Ala (C-1651→A). The data shown in Fig. 4 strongly 
suggest that rapid, probably co-translational autoprocessing 
of the wild-type protein occurred. The apparent sizes of the 
major products of ~125 (C-1651→A) and ~115 kDa (wild- 
type sequence) are consistent with cleavage at the predicted 
site, 818Gly|Ala819 (Snijder et al., 2003). Probably due to the 
presence of only two Met residues in the N-terminal 
cleavage product, the expected N-terminal 9 kDa protein 
could not be detected convincingly. Comparative analysis of 
coronavirus PLpro cleavage sites revealed that the (pre- 
dicted) SARS-CoV PL2pro sites are more conserved than the 
equivalent HCoV-229E PL1pro and PL2pro sites (Fig. 5). The 
poor conservation of HCoV-229E PLpro cleavage sites 
probably results from the overlapping substrate specificities 
of PL1pro and PL2pro, which we reported previously (Ziebuhr 
et al., 2001). Consistent with the hypothesis that conserva- 
tion of only one PLpro domain is linked to an increase in 
specificity, we also find that the IBV PL2pro cleavage sites 
are much better conserved than the corresponding HCoV- 
229E PL1pro/PL2pro sites (Fig. 5). The observed narrow 
substrate specificity of the SARS-CoV PL2pro will certainly 
facilitate the development of selective proteinase inhibitors. 


[Fig. 5 graphic: alignment of the P9-P1|P1'-P9' residues of each cleavage site; rows as listed below]

Virus        Site         Protease
HCoV-229E    nsp1|nsp2    PL1/PL2?
HCoV-229E    nsp2|nsp3    PL1/PL2
HCoV-229E    nsp3|nsp4    PL1?/PL2?
SCoV-FRA1    nsp1|nsp2    PL2?
SCoV-FRA1    nsp2|nsp3    PL2?
SCoV-FRA1    nsp3|nsp4    PL2?


Fig. 5. Alignment of established and predicted HCoV-229E 
PL1pro and PL2pro, and SARS-CoV (SCoV) and IBV PL2pro cleav- 
age sites. Note the overlapping substrate specificity of HCoV- 
229E PL1pro and PL2pro towards (at least) one cleavage site. 
IBV-B, infectious bronchitis virus strain Beaudette, NC_001451; 
HCoV-229E, human coronavirus 229E, NC_002645. Positions 
with absolute conservation are highlighted in red. Green and 
yellow mark descending levels of conservation using the following 
amino acid similarity groups: (i) D, E, N, Q; (ii) S, T; (iii) K, R; (iv) 
F, W, Y; and (v) I, L, M, V. The (predicted) locations of the scissile 
bonds (|) in the cleavage sites along with the numbers of pp1a/ 
pp1ab residues are indicated. 
Polyprotein processing by 3CLpro 


At present the viral main proteinase, 3CLpro, is the best- 
characterized coronavirus enzyme. Its essential function is 
reflected by the fact that it cleaves as many as 11 sites in the 
replicase polyproteins and also releases the key replicative 
functions, such as the RNA-dependent RNA polymerase and 
the helicase, from the polyprotein precursors (Gorbalenya 
et al., 1989a; Ziebuhr et al., 2000). Furthermore, 3CLpro is 
the only coronavirus protein for which structure informa- 
tion is available (Anand et al., 2002, 2003). Coronavirus 
3CLpros are cysteine proteinases with interesting properties. 
They feature a serine proteinase (chymotrypsin)-like two-β- 
barrel fold but employ a catalytic Cys-His dyad instead of 
the classical Ser-His-Asp triad. Furthermore, they possess a 
third, C-terminal domain composed of five α-helices. Our 
previous work has shown that the 3CLpro substrate speci- 
ficities are conserved among the three established groups of 
coronaviruses (Hegyi & Ziebuhr, 2002) and comparison of 
the previously known coronavirus cleavage sites with those 
identified in the SARS-CoV polyproteins (Fig. 3; Fig. 6a 


[Fig. 6 panel titles: (a) 66 3CLpro cleavage sites of 6 coronaviruses; (b) 11 3CLpro cleavage sites of SARS-CoV]
Fig. 6. Conservation of coronavirus main proteinase substrate specificities. Conservation of 3CLpro sites in polyproteins of
SARS-CoV (b) and six other coronaviruses (a). Two separate multiple, gap-free 18-aa-long alignments including the P9-P9'
positions of the sites (presumably) cleaved by the 3CLpro domains of six coronaviruses (transmissible gastroenteritis virus,
NC_002306; HCoV-229E; porcine epidemic diarrhoea virus, NC_003436; mouse hepatitis virus A59, NC_001846; bovine
coronavirus, AF391542; IBV-B) and SARS-CoV were converted into two sequence logo (Schneider & Stephens, 1990)
presentations. In the logos, the height of each letter (amino acid residue) is proportional to its frequency at the specific position, with
the highest-frequency residue being on top of the stack. The height of the entire stack is proportional to the information at this
position, which is measured in bits, with the upper limit of information at any position being equal to 4.32 bits. Amino acid residues
are coloured in the following groups: light-green, S, T, C; orange, N, Q; red, D, E; blue, K, R, H; brown, W, F, Y; black, A, L, I,
V, M; pink, P; green, G. The most conserved and important positions have relatively high letter heights and are easily recognized,
along with the individual and group residues occupying them. (c) Representation of the SARS-CoV pp1a/pp1ab cleavage sites
predicted to be cleaved by PL2pro (shown in blue) or 3CLpro (red). Also shown is the structure of the substrate used in the trans-
cleavage assay. The P1 and P1' residues of predicted cleavage sites are given. (d) SARS-CoV pp1a/pp1ab residues were translated
in vitro and incubated for 180 min at 25 °C (lanes 1, 6, 11); or were incubated for 30 min (lanes 2, 7, 12), 60 min (lanes 3, 8, 13)
and 180 min (lanes 4, 9, 14), respectively, with bacterially expressed SARS-CoV (lanes 2-4), TGEV (lanes 7-9) and HCoV (lanes
12-14) main proteinases. As controls, the corresponding active-site Cys-substituted SARS-CoV (lane 5), TGEV (lane 10) and
HCoV (lane 15) 3CLpros were incubated for 180 min with the same substrate.

and b) suggests conservation of the P4 (Ser, Thr, Val, Pro,
Ala), P2 (Leu, Ile, Val, Phe, Met), P1 (Gln) and P1' (Ser, Ala,
Gly, Asn, Cys) residues, all of which have previously been
proposed to define the coronavirus 3CLpro substrate
specificity (Ziebuhr et al., 2000). In SARS-CoV, the strong
preference for Leu at the P2 positions of substrates is less
prominent and the more frequent use of Phe may indicate a
slightly larger S2 subsite. There are also conservative changes
in the P4 and P1' positions of SARS-CoV 3CLpro substrates.
To determine whether these subtle variations result in 
differences in substrate specificity, we characterized the 
SARS-CoV 3CLpro specificity in more detail and compared it
with that of the well-characterized HCoV-229E and TGEV
enzymes. SARS-CoV 3CLpro was expressed as an MBP
fusion protein, an approach that had proven suitable for the
production of active coronavirus 3CLpros (Seybert et al.,
1997; Ziebuhr et al., 1997; Hegyi et al., 2002). Following
factor Xa cleavage of the purified fusion protein, the released
SARS-CoV 3CLpro was used in trans-cleavage experiments
with in vitro-translated substrates (Fig. 6d) or synthetic
peptides derived from group 1 (TGEV) and group 2 (MHV)
N-terminal 3CLpro autoprocessing sites (Hegyi & Ziebuhr,
2002; Seybert et al., 1997) (data not shown). The substrate
derived from the C-terminal pp1a sequence was chosen
because it contained as many as three cleavage sites, allowing 
side-by-side comparison of cleavage efficiencies at different 
sites. Furthermore, the fact that the proteolytic processing of 
this region has already been studied extensively in other 
coronaviruses (Ziebuhr & Siddell, 1999; Bost et al., 2000) 
allowed safe predictions on SARS-CoV 3CLpro cleavage
products in this region. As shown in Fig. 6(d), the cleavage
products (and several intermediate products) predicted to
be released by 3CLpro cleavage (Fig. 3; Snijder et al., 2003)
were readily detectable upon incubation of the translation
product with purified SARS-CoV 3CLpro, but not with the
active-site Cys-to-Ala mutant. Significantly, nearly identical
cleavage kinetics were observed when purified TGEV and
HCoV-229E 3CLpros (instead of SARS-CoV 3CLpro) were
incubated with the SARS-CoV-derived substrate. Again, as 
expected, no cleavage was observed with the active-site 
Cys-to-Ala (Val) mutants. Consistent data were also 
obtained in peptide cleavage assays, where the SARS-CoV, 
HCoV-229E and TGEV enzymes were shown to cleave 
TGEV and MHV substrates with equal efficiencies (data not 
shown). The data conclusively demonstrate the proteolytic 
activity of the expressed SARS-CoV protein and, even more 
importantly, suggest conservation of coronavirus 3CLpro
substrate specificities. 
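
As an illustration only (it is not part of the analysis reported here), the P4/P2/P1/P1' consensus listed above and the information measure used for the logos in Fig. 6 can be sketched in Python. The names `CONSENSUS`, `matches_consensus` and `information_bits` are hypothetical, the example windows are made-up inputs rather than data from this study, and the information calculation omits any small-sample correction.

```python
import math
from collections import Counter

# Residue sets quoted in the text for the conserved positions.
# P3 is not constrained in the text, so it is left open here.
CONSENSUS = {
    0: set("STVPA"),  # P4
    2: set("LIVFM"),  # P2
    3: set("Q"),      # P1
    4: set("SAGNC"),  # P1'
}


def matches_consensus(window: str) -> bool:
    """Check a 5-residue window (P4 P3 P2 P1 P1') against the consensus."""
    if len(window) != 5:
        raise ValueError("expected exactly five residues: P4 P3 P2 P1 P1'")
    return all(window[i] in allowed for i, allowed in CONSENSUS.items())


def information_bits(column: str) -> float:
    """Information at one alignment position: log2(20) minus Shannon entropy.

    No small-sample correction is applied (a simplification of this sketch).
    """
    total = len(column)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(column).values()
    )
    return math.log2(20) - entropy


# Hypothetical inputs for illustration only (not data from the paper).
print(matches_consensus("AVLQS"))          # Leu at P2, Gln at P1
print(matches_consensus("AVLES"))          # Glu at P1 violates the consensus
print(round(information_bits("QQQQ"), 2))  # invariant position
print(round(information_bits("LLFF"), 2))  # two residues, equally frequent
```

An invariant position such as P1 (Gln) reaches the upper limit of log2(20), about 4.32 bits, whereas a position shared equally by two residues carries one bit less.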


Essential proteins involved in replication and 
transcription 


Characterization of individual protein functions involved in 
coronavirus replication and transcription is still at a very 
early stage. Thus, for example, there are essentially no
experimental data on coronavirus RNA polymerase activities.
Only the helicase has been characterized to some extent 
(Seybert et al., 2000a). On the basis of seven conserved
sequence motifs, the coronavirus enzyme has been classified
as belonging to the helicase superfamily 1 (Gorbalenya et al., 
1989b). We previously reported that helicases of HCoV- 
229E and the arterivirus equine arteritis virus (EAV) possess 
polynucleotide-stimulated NTPase activity and 5’-to-3’ 
RNA and DNA duplex-unwinding activities (Seybert et al., 
2000a, b). Both coronavirus and arterivirus helicases require 
an N-terminal (putative) metal-binding domain, consisting 
of at least 12 conserved Cys and His residues, for activity. 
The domain is essential for replication, transcription and 
virion morphogenesis (as demonstrated for EAV) (van 
Dinten et al., 2000). The SARS-CoV helicase is predicted to
be released from pp1ab by 3CLpro-mediated cleavages of
the 5301Gln|Ala5302 and 5902Gln|Ala5903 dipeptide bonds
(Fig. 3; Snijder et al., 2003). We therefore expressed SARS-
CoV pp1ab residues 5302-5902, representing the 67 kDa
helicase together with its N-terminal metal-binding domain, 
in a bacterial expression system. A fusion protein, consisting 
of MBP and the SARS-CoV helicase domain, was purified by 
amylose affinity chromatography and used in ATPase and 
duplex-unwinding assays. Fig. 7(a) shows that this fusion 
protein (MBP-HEL), but not a control protein (MBP- 
HEL_KA) in which the conserved Lys residue of the Walker 
A box (Walker et al., 1982) was replaced by Ala, had ATPase 
activity that could be stimulated by poly(U). The protein 
also had DNA duplex-unwinding activity (Fig. 7b). This is 
consistent with the previously characterized HCoV-229E 
helicase (Seybert et al., 2000a), which had also shown
high promiscuity with respect to the substrates used. Thus,
both DNA and RNA duplexes were found to be unwound 
by the HCoV-229E helicase with similar efficiency and all 
types of nucleotide and ribonucleotide cofactors were used 
(Seybert et al., 2000a; K. A. Ivanov, V. Thiel & J. Ziebuhr, 
unpublished data). 
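
As a rough plausibility check on the size quoted above, the mass of the expressed helicase region can be estimated from its residue range. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this study; the function names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This value is an assumption of the sketch, not a figure from the paper.
AVERAGE_RESIDUE_MASS_DA = 110.0


def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive pp1ab residue range."""
    return last - first + 1


def estimate_mass_kda(first: int, last: int) -> float:
    """Approximate molecular mass in kDa for an inclusive residue range."""
    return residue_count(first, last) * AVERAGE_RESIDUE_MASS_DA / 1000.0


# pp1ab residues 5302-5902 are the helicase range given in the text.
print(residue_count(5302, 5902))
print(round(estimate_mass_kda(5302, 5902), 1))
```

The range covers 601 residues, giving an estimate of about 66 kDa, which is in line with the 67 kDa quoted for the helicase; the exact mass depends on the amino acid composition.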


Conclusion 


The study provides the first experimental data on both 
mechanisms and enzymes involved in SARS coronavirus 
genome expression and extends our understanding of the 
phylogenetic relationship between SARS-CoV and other 
coronaviruses. Given the unresolved global threat of the 
SARS-CoV epidemic, the rapid development of efficacious 
antiviral drugs is urgently needed. From this perspective, 
our results provide insights into key replicative enzymes 
which represent attractive targets for antiviral therapy. The 
methods established here for the large-scale production of 
recombinantly expressed, active SARS-CoV enzymes pave 
the way for high-throughput screening approaches to 
identify candidate inhibitors in large compound libraries. 
Thus, for example, the DNA duplex-unwinding activity 
of the SARS-CoV helicase should allow the use of high- 
throughput DNA (instead of RNA)-based helicase assays, 
which will significantly facilitate the search for inhibitors
(none are currently evident). Furthermore, the SARS-CoV PL2pro
is shown to possess a substrate specificity that is unusually
narrow among coronavirus PLpros, making this protein
another suitable target for rational drug design. At present,

Fig. 7. ATPase and duplex-unwinding activities of the SARS-
CoV superfamily 1 helicase. (a) ATPase activity was analysed
by thin-layer chromatography using [α-32P]ATP as a substrate.
Lanes: 1, reaction without protein; 2, with MBP-HEL_KA;
3, with MBP-HEL; 4, with MBP-HEL and 1 µM poly(U)s50.
(b) Duplex-unwinding activity was analysed using a twin-tailed
('forked') DNA substrate consisting of a 22 bp DNA duplex
with 30-nt-long, single-stranded oligo(dT) tails (see Methods).
Lanes: 1, substrate incubated with buffer; 2, heat-denatured
substrate; 3-5, reactions with 8 nM (lane 3), 80 nM
(lane 4), 800 nM (lane 5) MBP-HEL; 6, reaction with 800 nM
MBP-HEL_KA.


the SARS-CoV 3CLpro represents the most promising target.
Both the availability of crystal structures for TGEV and
HCoV-229E coronavirus 3CLpro enzymes, which share their
substrate specificities with SARS-CoV 3CLpro, and
the availability of active SARS-CoV 3CLpro for high-
throughput screening assays, should provide an excellent
basis for the rapid identification of candidate drugs suitable
for SARS therapy.